The development plan of the No. 4 ess included provisions for 
measuring the effectiveness of the design, operation, maintenance, 
and administration of the total system. This paper reviews system 
performance from 1976 to 1980, describes principal factors affecting 
system performance, and presents the service experience measured 
for the No. 4 ess. Steady improvement has been measured in the 
number of service-affecting incidents experienced per office each 
month. This improvement is also reflected in the rate of cutoff and 
denied calls, as well as in system "no call processing" time. We 
discuss some of the factors influencing this performance record, e.g., 
a sound initial design, reliable hardware, effective maintenance and 
repair tools, continuing analysis and resolution of causes of service- 
affecting incidents, and continuing development of new features for 
performance improvement. 

I. INTRODUCTION 

The No. 4 ess is a digital time-division toll and tandem switching 
system first placed in service in Chicago. It was described in the Bell 
System Technical Journal in 1976. 1 Since then, 51 offices with over 
1,000,000 terminations have been put into service. The deployment 
progress is shown in Fig. 1. The average size of the No. 4 ess is 22,000 
terminations with current office sizes ranging from 6,000 to over 60,000 
terminations. Detailed statistics demonstrate that the No. 4 ess pro- 
vides high-quality service to its customers and that its performance 
continues to improve as the system matures despite office growth, new 
generic programs, and evolving hardware. 

Fig. 1 — No. 4 ess deployment. 

Substantial effort has been applied to developing methods and 
procedures for evaluating the performance of hardware and software 
in the No. 4 ess. Data collection on performance parameters was built 
into the initial design so that performance data from many No. 4 ess 
systems could be obtained easily and accurately. New performance 
criteria have been developed to measure the effect on the customer 
and to provide data for maintaining the hardware and software. 

A typical No. 4 ess has intertoll and toll connecting trunks to about 
200 other switching entities. Because of its size and position in the 
network, its continuous availability for service is essential: any 
malfunction can affect communication in many areas of the 
country. All No. 4 ess machines are staffed 24 hours a day, 7 days a 
week, and all service-affecting incidents are reported and analyzed. 
Special attention is given to correcting the causes of service-affecting 
incidents. 

This paper describes some of the major system objectives, specific 
reliability and maintainability objectives, operational factors affecting 
performance, service experience, and methods used to manage per- 
formance. References 1 through 8 provide additional information on 
system performance. 

II. SYSTEM OBJECTIVES 

The traditional measure of telephone switching system reliability 
and performance is the amount of "no call processing" time in 40 
system years. This measure is a useful design objective, but it does not 
include all of the effects of complete and partial system failures which 
can lead to unsatisfactory performance from a customer viewpoint. 

The primary objective is to minimize the impact on the customer of 
all types of system failures. Consequently, cutoff calls and denied calls 
are among the most important performance indicators measured in 
the No. 4 ess. Many other performance indicators are also measured 
to determine the effectiveness of maintenance and operation so that 
procedures and design problems can be corrected promptly. 

As an example, the derivation of the cutoff call objective for a toll 
call is shown in Fig. 2. Calls are assumed to pass through two local 
offices, two toll offices, and interconnecting transmission facilities. As 
shown, the overall call cutoff objective is less than or equal to 15 calls 
per 10,000, with an allocation to each switching entity of less than or 
equal to 1.25 calls per 10,000. 
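
The allocation arithmetic can be checked with a short sketch. The overall objective and the per-entity allocation come from the text; treating whatever is left after the four switching entities as the share of the interconnecting transmission facilities is an assumption, since the full split appears only in Fig. 2.

```python
# Cutoff-call budget for a toll call, in calls per 10,000.
OVERALL_OBJECTIVE = 15.0      # from the text
PER_SWITCH_ALLOCATION = 1.25  # from the text, per switching entity
SWITCHING_ENTITIES = 4        # two local offices + two toll offices


def cutoff_budget(overall=OVERALL_OBJECTIVE,
                  per_switch=PER_SWITCH_ALLOCATION,
                  switches=SWITCHING_ENTITIES):
    """Return (switching_total, remainder) in calls per 10,000.

    The remainder is what is left for everything that is not a
    switching entity. Reading it as the transmission-facility share
    is an assumption, not a figure quoted in the paper.
    """
    switching_total = per_switch * switches
    return switching_total, overall - switching_total


switching_total, remainder = cutoff_budget()
print(f"switching entities: {switching_total} per 10,000")
print(f"remainder:          {remainder} per 10,000")
```

Under these assumptions the four switching entities together take one-third of the overall budget.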

Special performance criteria were set for the cutover of the first No. 
4 ess in Chicago in 1976 (referred to as Chicago 7). They were 
expressed both as objectives and concern thresholds. 1 Table I lists the 
objectives. 

Performance objectives have also been set for other performance 
indicators where supporting information is available. However, some 
performance measures are new, and the present, self-imposed, objec- 
tives are based on data obtained from typical No. 4 ess offices and 
were not part of the original design objectives. The new objectives are 
described later in this paper. 

The design of reliable telephone switching systems involves built-in 
tools to measure performance, as well as reliable hardware, software, 
and equipment configurations. Objectives must be set that are strin- 
gent, yet attainable at a reasonable cost. Objectives for the No. 4 ess 
performance are based on a reliability model and field data from the 
existing network. Advances in technology and the expectations of the 
public are also considered in setting objectives. 

The ultimate performance of telephone switching systems depends 
on design, as well as installation, operation, and maintenance. 
Consequently, standards have been developed for final installation 
acceptance tests, daily equipment performance, and routine maintenance 
procedures. 

Fig. 2 — Allocation of cutoff calls objective in calls per 10,000. The total cutoff call 
objective is less than or equal to 15 calls per 10,000. 

Table I — Chicago 7 cutover objectives 

Description             Objectives 
Ineffective Attempts    <1.25 percent 
Plug-in Replacements    <2 per day 
Interrupts              <50 per day 
Phases (2 or higher)    <1/2 per month 

III. RELIABILITY AND MAINTAINABILITY OBJECTIVES 

A primary architectural feature of the No. 4 ess is the system 
organization and design which provides dependable hardware and a 
software structure that can be operated and maintained by craft 
personnel. These objectives have been accomplished by using reliable 
circuitry and hardware redundancy with extensive supporting soft- 
ware. 

The software design provides centralized maintenance control from 
the 1A Processor. The processor and the peripheral equipment have 
configurable redundancy, which is accomplished automatically without 
affecting service. An automatic backup for the processor semiconduc- 
tor memory is provided by the disk system, which in turn has a 
magnetic tape system backup. A detailed description of circuit relia- 
bility and system redundancy can be found in Ref. 2. 

3.1 Reliability 

The basic element of a reliable system is well-designed hardware 
that includes trouble-detection features and ease of component re- 
placement. The development of the No. 4 ess is based on a gold metal 
system for semiconductors and their interconnection. The connector 
contacts are also gold plated. The basic design features include open- 
frame convection cooling (rather than fan cooling) and the ability to 
operate in a temperature range of 30°F to 120°F. The hardware is 
designed to make per-frame checks and depends on a centralized 
software maintenance system to automatically reconfigure the hard- 
ware in case of trouble, to diagnose the frame reporting irregularities, 
and to locate the faulty component so it can be replaced by mainte- 
nance personnel. 

A reliability model was developed for the No. 4 ess to help translate 
service objectives into a redundancy plan and to predict long-term 
performance. The No. 4 ess reliability model specifies a number of 
hardware failure modes, determines their impact on performance, and 
predicts their likelihood of occurrence. 2 The model was derived 
principally through analysis of predicted hardware failure rates, system 
hardware configurations, and predicted repair times. However, the 
model did not attempt to directly account for the following factors: 
(i) procedural errors, 
(ii) change activity, 
(iii) growth, 
(iv) retrofits, 
(v) routine exercise, 
(vi) software deficiencies, and 
(vii) hardware design deficiencies. 

Instead, the hardware failure rates predicted by the model were 
scaled to account for procedural errors and software errors based on 
experience gained from previous systems. No provision was made for 
generic program retrofits since their frequency is determined by the 
rate of new feature introduction in each office, which was unknown at 
that time. 

The hardware reliability of the overall system is a function of its 
size, hardware failure rates, redundancy plan, and mean repair times. 
Data taken over a 4-year period show that the predicted hardware 
failure rates essentially have been met. Special repair studies have 
been conducted which show that the mean time to repair solid faults 
is 1.25 hours while the mean time to repair intermittent faults is 20.5 
hours. As shown in Fig. 8, component failures cause only 11 percent of 
the service-affecting incidents. 

3.2 Maintainability 

The No. 4 ess is designed to perform extensive maintenance functions 
automatically so that problems are rapidly corrected and personnel 
costs are minimized. The initial design provided work centers 
at each office for maintenance and administration. Experience has 
shown that centralized maintenance and administration for up to six 
No. 4 ess offices is possible. 

Switching Management Control Centers (smccs) have been imple- 
mented to centralize the maintenance functions. This has led to the 
centralization of expertise, reduction of total maintenance personnel, 
and improved system performance. Additional centralization of Ma- 
chine Administration Centers and Trunk Operations Centers is 
planned for the future. 3 

Current field experience demonstrates that the basic system design 
is highly reliable and that craft-level personnel can maintain the 
system. Hardware displays, software support tools, and new mainte- 
nance documentation (task-oriented practices) have contributed sig- 
nificantly to the performance of the maintenance personnel. 

IV. OPERATIONAL FACTORS AFFECTING PERFORMANCE 

The principal operational factors affecting performance of a No. 4 
ess are the change and repair activities and some of the environmental 
factors that can affect No. 4 ess service. Taken together, they represent 
a high level of activity in many offices. Section V presents performance 
statistics which include the service impact of these activities. 

4.1 Variety of system configurations 

One significant factor is the variety of configurations of the No. 4 
ess. Each installation is engineered to match the service requirements 
of a particular location; therefore, each office is different. This implies 
that fault recognition and system recovery programs must be able to 
operate with any of the possible equipment configurations. 

4.2 Evolution of equipment 

As mentioned earlier, the equipment comprising the No. 4 ess has 
evolved rapidly and many early offices have added each new type of 
equipment as it became available. The result is a mixture of vintages 
of equipment, complicating the environment in which system integrity 
and fault recognition programs must operate. An example is the first 
No. 4 ess office, Chicago 7. It has a mixture of core, small (64K) 
semiconductor and large (256K) semiconductor memory frames. Sim- 
ilarly, in its time-division network, Chicago 7 has original vintage Time 
Slot Interchange (tsi) frames, a cost-reduced vintage of tsi frames, 
and the present version called the tsi-B. 

Virtually every type of equipment has evolved to incorporate new 
technology since initial introduction: the Digroup Terminal (dt) has 
been cost reduced and replaced with the Digital Interface Frame (dif), 
the Signal Processor was replaced with the Signal Processor 2 and 
eventually its signal processing function was incorporated into the dif, 
Common-Channel Interoffice Signaling (ccis) terminals have been 
improved, and the common control echo suppressor was added and 
will be superseded by per-trunk echo cancelers. Figure 3 gives a more 
complete picture of the evolution of No. 4 ess equipment. 

4.3 Growth activity 

The rate of growth additions to existing No. 4 ess systems has 
increased steadily. Figure 4 shows the number of major growth jobs in 
progress and the number of new No. 4 ess offices placed in service 
during each year since 1976. Nearly two-thirds of the operational No. 
4 ess systems have been expanded with growth jobs. Through the end 
of 1979, growth activity had added over 900 frames of ess equipment 
and provided over 350,000 new terminations, or nearly one-third of all 
installed No. 4 ess terminations. Several offices have been expanded 
several times, sometimes with additions providing 30,000 
terminations. 4 

The growth process has been designed so equipment can be added 
without affecting service. However, growth and related activities have 
been responsible for approximately 5 percent of the service-affecting 
incidents (see Section 5.2) in the No. 4 ess. The principal causes of 
these incidents have been human error, system software problems, and 
equipment failure in some of the new equipment shortly after it was 
made operational. Improvements to the growth process include 
incorporating temperature stress tests and extra network transmission 
path checks into selected growth procedures to improve the reliability 
of the new equipment once it is made operational. 

Fig. 4 — Trend in No. 4 ess growth activity. 

4.4 Hardware change activity 

Over 400 Change Notices (cns) have been prepared by Western 
Electric to implement hardware changes in No. 4 ess equipment. The 
scope of cns includes wiring changes, circuit pack changes (including 
firmware updates), documentation, and addition of new types of 
equipment to existing frames. Change Notices may be prompted by 
design changes initiated by Bell Laboratories or by manufacturability 
problems discovered by Western Electric. All hardware changes are 
authorized and monitored by the No. 4 ess hardware change commit- 
tee. The Western Electric Product Engineering Control Center (pecc) 
tracks application of cns in the field. 

4.5 Software change activity 

Software problems account for 25 percent of all No. 4 ess service- 
affecting incidents. These are problems not detected in laboratory 
system tests or in first application office field tests. Such problems 
may go undetected until the generic program is introduced into an 
office with a particular configuration. Some software problems are 
caused by incomplete defensive checks and are only stimulated 
through combinations of failures; others are simply design errors. 

Table II shows the size of the No. 4 ess program with the introduc- 
tion of each new version. The numbers of problems corrected after the 
generic was placed in service are also shown. Although the quality of 
each generic issue is improving, as demonstrated by the decreasing 
number of service-affecting incidents per office (Fig. 7), the number of 
field problems fixed has increased for each generic. This is a result of 
greater exposure to different office configurations and the contribution 
of undiscovered problems carried forward from previous generics. 
Generally, these software changes are of two types: the relatively few 
urgent fixes are called out to all offices or transmitted by the Software 
Change Administration and Notification System and installed with 
generic utility overwrites; the remainder are installed only when a 
partial update is distributed to each office. A partial update is a 
technique for introducing large numbers of program corrections with- 
out affecting service. Figure 5 shows a plot of the problems identified, 
fixes under test, and overwrites delivered to the field for the 4E4 
generic. 

One of the major reasons the No. 4 ess has provided excellent 
service, despite the existence of software problems, is its basic system 
architecture and software integrity design. It is not technically or 
economically feasible to detect and fix all software problems in a 
system as large as the No. 4 ess. Consequently, a strong emphasis has 
been placed on making it sufficiently tolerant of software errors to 
provide successful operation and fault recovery in an environment 
containing software problems. 

Another type of software change activity involves the office data 
base which includes translations, parameters, trunking, and routing 
information. Occasionally, corrections and changes are made to the 
office data base with standard recent change methods and also with 
generic utility system overwrites. 

Table II — Field problems. Problems fixed in field as of August 26, 1980. 

4.6 Retrofits 

A major type of software change activity is a generic retrofit in 
which each No. 4 ess replaces its current generic program with the 
latest generic. Current plans call for each office to receive the new 
generic within a year of its official release. Figure 6 shows the number 
of retrofits each year since 1976, indicating a large increase as new 
offices have been added. 

A new office data base is compiled for each retrofit. The data base 
is expanded in anticipation of future growth and also includes a 
recompiled description of the current office data. Other types of 
software changes generally made during retrofits, and also once 
between them (as "midgeneric releases"), are the introduction of new 
network management software, new trouble-locating procedure tapes 
that help office maintenance personnel locate faulty circuit packs when 
diagnostic tests indicate trouble, and new library programs that contain 
infrequently used test and administrative programs. 

4.7 Rearrangements 

In addition to hardware changes, software changes, growth and 
retrofit activity, office performance can also be affected by major 
rearrangements. Three principal kinds of rearrangements have oc- 
curred. Common-Channel Interoffice Signaling terminals in the first 
28 offices are being rearranged to improve system reliability. This 
involves growth of new terminals and execution of a special library 
program to modify 12 translators in the office data base to effect a new 
terminal pairing arrangement. The second major rearrangement was 
a series of activities to allow one office to serve as a gateway office, a 
function normally planned when an office is first installed. The third 
type of rearrangement performed was to change the pulse point control 
for large numbers of frames in one office to increase its reliability. 

4.8 Repair 

Equipment fails and requires repair on an ongoing basis in No. 4 ess 
offices. The average circuit-pack replacement rate for the first quarter 
of 1980 was 1.7 per day per office. This is half the rate experienced 
during the first 122 days of Chicago 7 operation in 1976, and it meets 
the short-term objective of less than two per day per office. 1 To place 
this number in perspective, a typical No. 4 ess contains 50,000 circuit 
packs. In a small fraction of cases, office technicians must use oscillo- 
scopes and probe communication buses and backplane wiring to isolate 
equipment faults. Such routine repair of equipment often involves 
several steps, and human error in performing them accounts for 18 
percent of the service-affecting incidents. 
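
To put the replacement rate in proportion to office size, the two figures above can be combined into an annual replacement fraction. This is only a back-of-the-envelope sketch: the 365-day year is an assumption, and a replaced pack is not necessarily a failed pack, so the result is a replacement fraction rather than a failure rate.

```python
REPLACEMENTS_PER_DAY = 1.7  # first quarter of 1980, per office (from the text)
PACKS_PER_OFFICE = 50_000   # typical No. 4 ess office (from the text)
DAYS_PER_YEAR = 365         # assumption: non-leap year


def annual_replacement_fraction(per_day=REPLACEMENTS_PER_DAY,
                                packs=PACKS_PER_OFFICE,
                                days=DAYS_PER_YEAR):
    """Fraction of an office's circuit packs replaced in one year."""
    return per_day * days / packs


fraction = annual_replacement_fraction()
print(f"{fraction:.2%} of circuit packs replaced per year")
```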

4.9 Other factors 

Although No. 4 ess offices are well-protected from most external 
factors, some have had an impact on service. In particular, some offices 
have been affected by air-conditioning problems, power-distribution 
failures, failure of non-No. 4 ess equipment, and static discharge. 

V. SERVICE EXPERIENCE 

5.1 Service-affecting incidents 

To track the performance of the No. 4 ess, a service-affecting incident 
(or simply, incident) has been defined as any equipment failure or 
major system recovery action with a significant effect on the customer. 
Specifically, incidents include: 
(i) Hardware failures affecting more than 360 trunks. 
(ii) System recovery directed Phase 1 and Phases 2, 3, and 4. 
(iii) System reinitializations. 

5.1.1 Hardware failures 

Hardware failures affecting more than 360 trunks are called Multiple 
Unit Failures (mufs). Originally, mufs represented half the trunks 
served by a Voiceband Interface Frame (vif). With the addition of 
frames, such as the dif serving up to 3840 trunks, a muf is now defined 
as an outage affecting more than 360 trunks. In duplicated equipment, 
duplex failures and/or restoral from them also cause the recovery 
actions described in Section 5.1.2. 

5.1.2 System recovery 

When the No. 4 ess must halt call processing to recover from 
problems, the result is called a system recovery phase. In a directed 
Phase 1, all calls associated with a duplex-failed peripheral frame are 
lost; however, the other stable calls in the system are saved. A directed 
Phase 1 can have a duration from 1 to 15 seconds. 

A Phase 2 is used to recover from memory mutilation or peripheral 
configuration problems. It checks the integrity of fixed data, such as 
program store, with a hashing algorithm, reconfigures the peripheral 
complex with peripheral bootstrap (when F-level interrupts implicate 
the periphery), and initializes most of the call store memory spectrum 
that is not related to stable calls. A Phase 2 saves stable calls and 
requires less than 30 seconds if peripheral recovery is not required, 
and less than 60 seconds if it is. Calls in the dialing state are lost during 
a Phase 2. 

A Phase 3 is used when a complete processor or peripheral recon- 
figuration is required. It lasts from 1 to 4 minutes, depending on office 
size, and saves stable calls. Calls in the dialing state are lost, as in a 
Phase 2. 

A Phase 4 is similar to a Phase 3, but it is initiated manually and 
disconnects all calls. 

5.1.3 System reinitialization 

A System Reinitialization is a complete reload of the generic pro- 
gram from magnetic tape. It is required only under the most severe 
cases in which data in program store and both file stores are mutilated. 
It can take up to 20 minutes and it disconnects all calls. 
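
The recovery actions described in Sections 5.1.2 and 5.1.3 can be summarized as a small lookup table. This is an illustrative sketch only: the class and field names are hypothetical, the durations are the upper bounds quoted above, and the Phase 4 duration is assumed equal to that of Phase 3 because the text says only that the two are similar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryAction:
    name: str
    max_seconds: int          # upper bound on duration quoted in the text
    saves_stable_calls: bool  # False means all calls are disconnected


ACTIONS = [
    # Calls on the duplex-failed frame are lost; other stable calls survive.
    RecoveryAction("directed Phase 1", 15, True),
    # Under 30 s without peripheral recovery, under 60 s with it.
    RecoveryAction("Phase 2", 60, True),
    # 1 to 4 minutes, depending on office size.
    RecoveryAction("Phase 3", 240, True),
    # Assumption: same duration bound as Phase 3; initiated manually.
    RecoveryAction("Phase 4", 240, False),
    # Complete reload of the generic program from magnetic tape.
    RecoveryAction("System Reinitialization", 1200, False),
]


def actions_that_drop_all_calls(actions=ACTIONS):
    """Names of the recovery actions that disconnect every call."""
    return [a.name for a in actions if not a.saves_stable_calls]


def worst_case_seconds(name, actions=ACTIONS):
    """Upper bound on the duration of the named recovery action."""
    for action in actions:
        if action.name == name:
            return action.max_seconds
    raise KeyError(name)


print(actions_that_drop_all_calls())
print(worst_case_seconds("Phase 2"))
```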

5.1.4 Number of incidents 

When several recovery phases or mufs are stimulated by the same 
event, or follow in succession, they are considered a single incident. All 
No. 4 ess service-affecting incidents are recorded and analyzed. The 
record of these incidents provides an extremely valuable method for 
evaluating system performance and for guiding efforts to improve it. 
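
The counting rule, in which recovery phases or mufs that follow in succession form a single incident, can be sketched as an interval merge. The paper does not say how close together two events must be to count as "in succession"; the 300-second gap used here, the function name, and the sample log are all hypothetical.

```python
def group_into_incidents(events, max_gap_seconds=300):
    """Merge recovery events into incidents.

    events: iterable of (start_seconds, end_seconds, label) tuples.
    An event joins the current incident when it starts no more than
    max_gap_seconds after that incident's latest end time. The gap
    value is an assumption, not a figure from the paper.
    """
    incidents = []
    for event in sorted(events):
        start, end, _label = event
        if incidents and start - incidents[-1]["end"] <= max_gap_seconds:
            incidents[-1]["events"].append(event)
            incidents[-1]["end"] = max(incidents[-1]["end"], end)
        else:
            incidents.append({"start": start, "end": end, "events": [event]})
    return incidents


# Hypothetical log: a Phase 2 escalating to a Phase 3, then an
# unrelated directed Phase 1 much later.
log = [
    (0, 40, "Phase 2"),
    (100, 280, "Phase 3"),
    (5000, 5002, "directed Phase 1"),
]
incidents = group_into_incidents(log)
print(len(incidents), "incidents")
```

With this log the first two events merge into one incident and the third stands alone.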

Figure 7 is a graph of the number of service-affecting incidents per 
office per month. The trend indicated is a reduction in the average 
number experienced by an office to 1.4 per month during the first 
quarter of 1980. It is significant that a high fraction of service-affecting 
incidents occur in low traffic periods. Over 55 percent of the "no call 
processing" time (see Section 5.3.1) has occurred between midnight 
and 8:00 a.m., to a great extent because routine exercise, complicated 
repairs, change installation, growth activity, retrofits, and other activ- 
ities with high risk are generally scheduled during the periods of lowest 
traffic. 

Each service-affecting incident is classified into one or more of the 
categories shown in Fig. 8. Software design problems account for 25 
percent of the total causes. These problems form the basis of an 
investigation list that is used to guide software current engineering 
effort. The expected category comprises 16 percent of the incidents. 
These are cases in which the system reacted as expected, such as 
planned retrofits, intentional test phases, or when it is impossible to 
resolve a problem to the proper unit of a duplicated pair and the 
system must randomly choose the unit to be removed. Duplex frame 
failures are incidents that occur because a frame is simplex for repair 
and a fault develops in the active controller. They comprise 11 percent 
of the total. Unresolved incidents are the 13 percent for which sufficient 
data to thoroughly analyze the source of the incident is unavailable. 
Hardware design incidents are the 4 percent caused by the hardware 
design of a particular frame or subunit. Hardware design problems are 
considered by the No. 4 ess hardware change committee and fixes are 
scheduled as appropriate. Wiring errors account for 8 percent of the 
incidents and include wiring breaks or loss of insulation integrity as 
well as errors or wire clippings inadvertently left in equipment when 
it was being repaired or modified. The technician error category 
includes operating telephone company craft and Western Electric 
installer errors, and comprises 18 percent of the total. Figure 8 also 
shows the causes for service-affecting incidents by their contribution 
to system no call processing time. 

Fig. 8 — Causes of service-affecting incidents, cumulative through March 31, 1980. (a) 
Percent of incidents, (b) Percent of no call processing time. 
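
As a consistency check on the category percentages quoted in this section, the following sketch tallies them and ranks the causes. The percentages are taken from the text; the dictionary keys are shortened labels, and the text does not itemize whatever share remains after the seven named categories.

```python
# Percent of service-affecting incidents by cause, as quoted in the text.
CAUSE_PERCENT = {
    "software design": 25,
    "technician error": 18,
    "expected": 16,
    "unresolved": 13,
    "duplex frame failure": 11,
    "wiring error": 8,
    "hardware design": 4,
}


def unlisted_percent(causes=CAUSE_PERCENT):
    """Share of incidents not covered by the categories named in the text."""
    return 100 - sum(causes.values())


def ranked(causes=CAUSE_PERCENT):
    """Categories sorted from largest to smallest share."""
    return sorted(causes.items(), key=lambda item: item[1], reverse=True)


for name, percent in ranked():
    print(f"{name:22s} {percent:3d}%")
print(f"{'not itemized':22s} {unlisted_percent():3d}%")
```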

5.2 Customer impact 

The principal No. 4 ess performance measures are those that show 
the impact of service-affecting incidents on the customer: cutoff calls 
and denied calls. 

Figure 9 shows the rate of calls cut off by the No. 4 ess. The first 
quarter, 1980, rate was 0.18 per 10,000 calls, well below the objective 
of 1.25 per 10,000. Denied calls are the measure of the No. 4 ess 
contribution to the customer's inability to complete calls on demand 
due to no call processing time. During the first quarter, 1980, the rate of 
denied calls was 0.28 per 10,000. The trend in the number of calls 
denied by the No. 4 ess is shown in Fig. 10. The effect on the customer 
of denied calls is difficult to measure, since alternate routing strategies 
elsewhere in the network can compensate for some No. 4 ess denied 
calls, often allowing the customer to complete the intended call. Both 
measures show substantial improvement over the period of time the 
No. 4 ess has been deployed. 

Fig. 9 — Cutoff calls. The first quarter 1980 rate was 0.18 per 10,000 calls, well below 
the objective, which was 1.25 per 10,000 calls. 

Fig. 10 — Denied calls. In the first quarter of 1980, the rate of denied calls was 0.28 per 
10,000 calls. 

5.3 System performance 

In addition to cutoff and denied calls, other performance factors are 
also used to give a more comprehensive measure of system 
performance. They are system- rather than customer-related measures of 
system performance and include: 
(i) no call processing time, 
(ii) trunk outage time, and 
(iii) Ineffective Machine Attempts (ima). 

5.3.1 "No call processing" time 

No call processing time is often expressed in terms of hours of time 
in 40 years. It includes outage time required for system reinitialization 
such as Phases 2, 3, and 4 and directed Phase 1 recovery actions. Note 
that during the No. 4 ess no call processing time caused by Phase 2 
and Phase 3, all stable calls continue, unless there is also a duplex 
failure of network or network interface equipment. Figure 11 illustrates 
that the long-term trend has been an improvement in "no call proc- 
essing" time to a first quarter, 1980, rate of 9.9 hours in 40 years. Since 
generic retrofits and data base updates require use of an intentional 
Phase 3 during the lowest traffic periods, there is a built-in requirement 
that approximately 1 hour in 40 years of this total be used for this 
purpose. Network management controls applied as part of the retrofit 
procedure virtually eliminate any customer impact. The rate of 9.9 
hours in 40 years comprises all of the factors shown in Figure 8. It is 
significant that factors, such 
as procedural errors and software deficiencies, that could not be 
specifically modeled (see Section III), account for nearly two-thirds of 
all downtime. Consequently, the internal objective of 2 hours in 40 
years of total system unavailability is under review. Nevertheless, no 
call processing time has steadily improved as maintenance and relia- 
bility enhancements have been added to the system. 
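
Because "hours in 40 years" is an unfamiliar unit, a short conversion sketch may help. It restates the three figures quoted above as minutes per year and as a fractional unavailability. The average year length of 365.25 days is an assumption, and the function name is hypothetical.

```python
HOURS_PER_YEAR = 365.25 * 24  # assumption: average year, leap days included


def downtime_metrics(hours_in_40_years):
    """Return (minutes_per_year, fractional_unavailability)."""
    minutes_per_year = hours_in_40_years * 60 / 40
    unavailability = hours_in_40_years / (40 * HOURS_PER_YEAR)
    return minutes_per_year, unavailability


figures = [
    ("measured, first quarter 1980", 9.9),
    ("planned retrofit allowance", 1.0),
    ("internal objective", 2.0),
]
for label, hours in figures:
    minutes, unavailability = downtime_metrics(hours)
    print(f"{label:30s} {minutes:6.2f} min/yr  "
          f"availability {1 - unavailability:.5%}")
```

Seen this way, the measured rate and the internal objective differ by roughly a factor of five, which is the gap the review of the objective has to address.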

Figure 12 shows the effect of two recent enhancements. It presents 
overlapping histograms showing the distribution of no call processing 
incidents for two 6-month periods, one ending on March 31, 1979, and 
another ending 1 year later. The significance of the first histogram is 
that it represents No. 4 ess performance before the directed Phase 1 
feature was available. The directed Phase 1 was introduced in the 4E4 
generic program and has been deployed both in new offices and 
through generic program retrofits. By March 31, 1980, all offices had 
the directed Phase 1 feature. Normally, the directed Phase 1 takes 
about 2 seconds to initialize a duplex-failed tsi frame. Prior to the 
directed Phase 1, a 1- to 3-minute Phase 3 was required. The signifi- 
cance of the second histogram is that the directed Phase 1 shifted the 
distribution so that 34 percent of all no call processing incidents require 
less than 30 seconds as compared with 2 percent prior to directed 
Phase 1. An additional enhancement, introduced late in 1979, was a 
shortened Phase 2 when no peripheral equipment was suspected by 
system integrity programs. This also reduces the no call processing 
time. 

5.3.2 Trunk outage time 

Trunk outage time is the measure of hardware failures, such as 
duplex-failed equipment or mufs. Note that no call processing time is 
not included in trunk outage time measurements. Figure 13 shows a 
graph of No. 4 ess trunk outage time. During the first quarter of 1980, 
the system performance was 38.0 minutes of outage per trunk per year 
compared with an objective of 28.0. Several maintenance enhance- 
ments are planned to help bring No. 4 ess performance closer to this 
objective. 

5.3.3 Ineffective machine attempts 

Some customer attempts to originate calls result in noncompleted 
calls. The No. 4 ess has a large and precise ineffective-attempt report- 
ing system that measures call failure statistics and allows an analysis 
of chronic problems. Over 300 call-failure modes are defined, including 
customer errors, failure of switching machines or transmission media 
connected in an incoming mode to the No. 4 ess, failure of the No. 4 
ess to establish a cross-office connection, and a failure of the switching 
machine or transmission media connected in an outgoing mode to the 
No. 4 ess. A subset of the total ineffective attempts is classified as 
imas. These include calls that must be terminated with incoming, 
connecting or outgoing reorder tone, vacant code announcements, or 
no-circuit tone. 

Fig. 12 — Incident duration for two six-month intervals that show the impact of the 
directed Phase 1 and shortened Phase 2. 

Fig. 13 — Trunk minutes out of service. For the first quarter of 1980 the system 
performance was 38.0 minutes of outage per trunk per year. The objective was 28.0. 

Fig. 14 — Ineffective machine attempts. The first quarter of 1980 had a rate of 1.02 
percent. The original objective was 1.25 percent. 

Figure 14 shows that the average adjusted domestic ima performance 
has remained relatively constant at a little over 1 percent of all 
attempts. The rate during the first quarter of 1980 was 1.02 percent, 
meeting the original objective of 1.25 percent. The rate for calls to 
other countries is higher. A study of the specific failures shows that 
the No. 4 ess and outgoing trunks contribute to less than 0.01 percent 
of the total number of imas. Most failures originate from irregularities 
in the incoming network. Further analysis shows that in large metropolitan 
systems, such as those in Chicago and New York City where 
common control Class 5 offices with multifrequency signaling or ccis 
are used, the reorder component of ima for domestic calls ranges 
only between 0.2 and 0.3 percent. However, where step-by-step or early 
vintage crossbar switches are used, the reorder ima ranges between 2 
and 3 percent even though the equipment is properly maintained. This 
can frequently be attributed in large part to outside plant problems not 
screened by these systems. The ima data are effective in identifying 
network problems, and also serve as a continuous check on network 
performance. 
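
The ima percentages above are rates relative to all attempts, and the reorder component is one part of the total. A minimal sketch of that arithmetic follows. The category names and counts are hypothetical, and the "adjusted domestic" adjustment mentioned in the text is not modeled because the paper does not describe it.

```python
def ima_rates(ima_counts, total_attempts):
    """Return per-category and overall IMA rates as percentages of all attempts."""
    if total_attempts <= 0:
        raise ValueError("total_attempts must be positive")
    rates = {name: 100.0 * count / total_attempts
             for name, count in ima_counts.items()}
    rates["total"] = 100.0 * sum(ima_counts.values()) / total_attempts
    return rates


if __name__ == "__main__":
    # Hypothetical counts for one measurement period (not No. 4 ess data).
    counts = {"reorder": 2_500, "vacant_code": 4_200, "no_circuit": 3_500}
    rates = ima_rates(counts, total_attempts=1_000_000)
    objective = 1.25  # original objective, in percent, as quoted in the text
    for name, rate in rates.items():
        print(f"{name}: {rate:.2f} percent")
    print("meets objective:", rates["total"] <= objective)
```

Keeping the reorder component separate from the total is what lets offices fed by different incoming equipment be compared, as the paragraph above does.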

5.4 Interrupts 

One of the most closely watched system maintenance indicators in 
No. 4 ess is the level of system interrupts. They generally indicate an 
unexpected response from a system action. For example, an equipment 
failure that affects a path through the time-division network may 
cause interrupts. (For a more complete description of system inter- 
rupts, see Refs. 2 and 5.) 

Although interrupts do not directly affect the customer, an objective 
has been set to help manage system maintenance activity. When the 
interrupt level rises, more attention needs to be spent on maintenance. 
The original empirical interrupt objective of less than 50 per day has 
been tightened to an average of less than 40 per day. Some small 
offices have an objective that is more stringent since they have less 
equipment. The average number of interrupts per office during the 
first quarter of 1980 was less than 25 per day, meeting the objective. 

VI. MANAGING PERFORMANCE 

6.1 Ongoing development 

The original design and implementation of the No. 4 ess are key 
factors in allowing the system to provide the current level of service. 
However, another key ingredient has been the management of No. 4 
ess performance. 

Each service-affecting incident is recorded in a data base, and analysis 
is performed monthly to track overall performance. When 
analysis shows that specific changes can improve 
system performance, they become candidates for features to be developed 
as part of the next generic program release. Committees review 
each new feature candidate for its impact on system resources, the 
development effort required, and the feature's value relative to other 
candidates. The directed Phase 1 was such a feature; it was proposed 
when analysis showed it could reduce system no call processing time. 

6.2 Current engineering 

In addition to new features aimed at improving performance, an 
ongoing effort also exists to identify problems in existing systems and 
to deliver fixes. Specific responsibility for carrying out this effort is 
assigned to a group that works closely with developers to generate the 
necessary fixes. Much of this effort is directed toward the large generic 
program. However, with the rapid introduction of new equipment, all 
modifications to existing hardware designs are also tracked by the 
hardware change committee. 

6.3 Acceptance tests 

In addition to its basic design, No. 4 ess performance is affected by 
how well each new system is installed and how in-service systems are 
operated. New systems must meet rigorous operational readiness tests 
and final verification acceptance tests before they are turned over from 
the installer to the operating telephone company (otc). Before the otc 
places the system in service, it must meet another set of performance 
criteria, of which the 7-day sliding interrupt average is the most visible. 
These performance criteria are specified in Bell System Practices and 
Western Electric Installation Handbooks expressly to help the otcs 
manage the quality of initial service they offer. After initial service, 
extensive service results performance measurements or indices are 
used to help judge the effectiveness of the team operating each No. 4 
ESS. 
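
The 7-day sliding interrupt average mentioned above can be sketched as a moving mean over daily interrupt counts. The paper does not state the acceptance threshold or the exact averaging rule, so the `threshold` value below (borrowed from the 40-per-day in-service objective in Section 5.4) and the sample counts are assumptions for illustration only.

```python
from collections import deque


def sliding_average(daily_counts, window=7):
    """Yield the mean of the most recent `window` daily counts once the window is full."""
    recent = deque(maxlen=window)
    total = 0
    for count in daily_counts:
        if len(recent) == window:
            total -= recent[0]  # the oldest value is about to be dropped
        recent.append(count)
        total += count
        if len(recent) == window:
            yield total / window


if __name__ == "__main__":
    # Hypothetical daily interrupt counts for a pre-service office.
    counts = [62, 55, 48, 41, 39, 35, 30, 28, 27, 25]
    threshold = 40  # assumed criterion, not taken from Bell System Practices
    for day, avg in enumerate(sliding_average(counts), start=7):
        print(f"day {day}: 7-day average {avg:.1f}, below threshold: {avg < threshold}")
```

A sliding window smooths out single bad days while still showing whether the interrupt level is trending down before cutover.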

6.4 Managing deployment 

Besides the performance of each individual No. 4 ess, performance 
management has been extended to help govern the rate at which new 
systems are deployed with new software and hardware. Specific rec- 
ommendations have been published in cooperation with AT&T that 
establish intervals after the first application office for subsequent new 
offices and for the beginning of the generic retrofit program. These 
recommendations limit the initial exposure of new software and hard- 
ware until sufficient experience is gained under actual operating con- 
ditions to allow rapid deployment with confidence that service per- 
formance standards will be maintained. 

The recommendations also specify the composition and duties of 
steering and cutover committees for each new system and major 
growth job. Recent experience indicates that these committees can be 
very effective and are key ingredients in the smooth transition from an 
earlier system to a new No. 4 ess. 

As indicated in Section III, there are many demands for changes, 
rearrangements, and additions to existing systems. To help manage 
this high level of activity, as well as to arbitrate schedule conflicts for 
new systems, retrofits, and data base updates, an Implementation 
Review Committee was formed with representatives knowledgeable in 
otc needs, Western Electric production and installation capacities, 
and Bell Labs development capabilities and schedules. One of its tasks 
is to help manage peak demands, such as the high fraction of systems 
requesting spring service dates to help meet busy season traffic de- 
mands. 

VII. SUMMARY 

The No. 4 ess has been incorporated successfully into the Bell 
System and international telecommunications network. Since the 
cutover of the first system in Chicago in January 1976, 51 systems 
terminating over 1,000,000 trunks have been put into service. During 
this period, the hardware and software have evolved to include the 
latest technology, which has made possible additional equipment cost 
savings and a reduction in space and power requirements. 

Experience with the No. 4 ess has confirmed the original design 
criteria for improved reliability and maintainability in stored pro- 
grammed control systems as follows: 

(i) Reliability, maintainability, and administrative features must 
be included in the original architecture of the entire system. 
(ii) Software integrity features are necessary to allow large systems 
to perform successfully in an environment containing software prob- 
lems. 

(iii) Automatic and semiautomatic maintenance aids are mandatory 
for maintaining modern systems. 

(iv) Many factors other than component failures cause reliability 
problems and must be considered in basic design decisions. 

(v) Built-in facilities for continually measuring performance param- 
eters are needed to make sure that performance criteria are met and 
to identify where improvements are required. 

(vi) Performance criteria should be based on customer impact. 

Inclusion of these concepts in the No. 4 ess has been a major factor 
in its excellent performance and rapid deployment. 